value function factorization method
A Unified Framework for Factorizing Distributional Value Functions for Multi-Agent Reinforcement Learning
Sun, Wei-Fang, Lee, Cheng-Kuang, See, Simon, Lee, Chun-Yi
In fully cooperative multi-agent reinforcement learning (MARL) settings, environments are highly stochastic due to the partial observability of each agent and the continuously changing policies of other agents. To address the above issues, we proposed a unified framework, called DFAC, for integrating distributional RL with value function factorization methods. This framework generalizes expected value function factorization methods to enable the factorization of return distributions. To validate DFAC, we first demonstrate its ability to factorize the value functions of a simple matrix game with stochastic rewards. Then, we perform experiments on all Super Hard maps of the StarCraft Multi-Agent Challenge and six self-designed Ultra Hard maps, showing that DFAC is able to outperform a number of baselines.
[ICML 2021 Spotlight] DFAC Framework: Factorizing the Value Function via Quantile Mixture for…
In multi-agent reinforcement learning (MARL), the environments are highly stochastic due to the partial observability of each agent and the continuously changing policies of the other agents. One of popular research directions is to enhance the training procedure of fully cooperative and decentralized agents. In the past few years, a number of MARL researchers turned their attention to centralized training with decentralized execution (CTDE). Among these CTDE approaches, value function factorization methods are especially promising in terms of their superior performances and data efficiency. Value function factorization methods introduce the assumption of individual-global-max (IGM) [1], which assumes that each agent's optimal actions result in the optimal joint actions of the entire group. Based on IGM, the total return of a group of agents can be factorized into separate utility functions for each agent.
Learning Nearly Decomposable Value Functions Via Communication Minimization
Wang, Tonghan, Wang, Jianhao, Zheng, Chongyi, Zhang, Chongjie
Reinforcement learning encounters major challenges in multi-agent settings, such as scalability and non-stationarity. Recently, value function factorization learning emerges as a promising way to address these challenges in collaborative multi-agent systems. However, existing methods have been focusing on learning fully decentralized value function, which are not efficient for tasks requiring communication. To address this limitation, this paper presents a novel framework for learning nearly decomposable value functions with communication, with which agents act on their own most of the time but occasionally send messages to other agents in order for effective coordination. This framework hybridizes value function factorization learning and communication learning by introducing two information-theoretic regularizers. These regularizers are maximizing mutual information between decentralized Q functions and communication messages while minimizing the entropy of messages between agents. We show how to optimize these regularizers in a way that is easily integrated with existing value function factorization methods such as QMIX. Finally, we demonstrate that, on the StarCraft unit micromanagement benchmark, our framework significantly outperforms baseline methods and allows to cut off more than $80\%$ communication without sacrificing the performance. The video of our experiments is available at https://sites.google.com/view/ndvf.